Method for classifying a time series, that includes a prescribable plurality of samples, with a computer

ABSTRACT

A method for classifying a time series, that includes a prescribable plurality of samples, with a computer wherein a generalized correlation integral is determined for at least a part of the samples of the time series. A functions family of an entropy function is determined from the values of the generalized correlation integral. A plurality of considered future samples is thereby employed as a family parameter of the functions family. The time series is classified into various types of characteristic processes from the curve of the functions family of the entropy function.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a method wherein it is possible to distinguish between a process characterized by a time series that describes a white noise and a Markov process, and wherein it is also possible to distinguish between a chaotic process and a chaotic process with underlying noise.

2. Description of the Prior Art

Technical fields in which it is of interest to draw conclusions about the future behavior of a time series from a measured time series can be seen, for example, in various areas of medicine. The prediction of a future course of a time series usually occurs given the assumption that the time series exhibits non-linear correlations between the samples of the time series. For example, a specific area of application within medicine is cardiology. Specifically in the problem area of sudden cardiac death, it is critical to recognize the early warning signs of sudden cardiac death in order to initiate counter-measures against the occurrence of sudden cardiac death as early as possible.

It generally represents a considerable problem to classify a measured signal, particularly an electrical signal, and its samples as, for example, a purely chaotic process, a chaotic process with underlying noise, a process with white noise or a Markov process.

For example, document [1] discloses the determination of what is referred to as a Kolmogorov entropy. Further, this document discloses the formation of a correlation function, which is explained in greater detail later.

It is known from documents [2], [3] to classify the time series into different types of time series on the basis of correlation integrals of the samples of the time series.

In these methods, however, the problem occurs that certain types of processes and, thus, types of time series cannot be discriminated. For example, it is not possible to distinguish with these methods between a process that is characterized by a time series that describes a white noise and a Markov process.

With these methods, further, it also is not possible to distinguish between a chaotic process and a chaotic process with underlying noise.

SUMMARY OF THE INVENTION

The present invention is thus directed to a method for classifying a time series with which the above-described types of time series that cannot be classified with the known methods also can be correctly distinguished and classified.

According to the method of the present invention, an electrical signal is sampled and a generalized correlation integral is determined for an arbitrary plurality of samples upon employment of preceding samples and future samples. The preceding samples and future samples relate to the sample for which the correlation integral is respectively currently intended. A functions family of an entropy function is determined from the plurality of identified values of the generalized correlation integral for the various samples. Given the functions family, an arbitrary plurality of considered, future samples is employed as a family parameter of the functions family. A partition interval quantity of a data space in which the samples can be located is employed as a run variable of the functions family. The time series is classified on the basis of the characteristic course of the functions family of the entropy function.

This method now makes it possible to also distinguish a process having a time series with underlying white noise from a process having a time series with the characteristics of a Markov process. A discrimination of time series with which a chaotic process is described from time series with which a chaotic process with noise is described is also possible.

A very simple and fast determination of the values of the generalized correlation integral results when a respective medium plurality of samples that are located around the sample, in an environment having a prescribable size, is taken into consideration.

A further simplification of the method occurs when the size of the environment is selected dependent on the partition interval quantity.

Further, it is advantageous in a development of the method to accelerate the classification wherein the time series is only classified into a first time series type and into a second time series type. This development of the method makes it possible to merely investigate whether, for example, the time series describes a chaotic process or a chaotic process with noise. This development leads to a considerable saving of calculating time since other specific instances of time series need not be taken into further consideration in the investigation of the time series.

The method can be employed advantageously in various technical fields; for example, when the time series is a measured cardiogram signal, a measured electroencephalogram signal, or a measured signal with which a voltage curve of brain pressure is described. A very simple and dependable classification of the individual time series, and conclusions about characteristic properties of the signals, are possible for these cases.

In a further embodiment of the method, it is provided that stochastic correlations in rate curves of a financial market be determined when the time series is established by such a rate curve. In this way, it is possible to make statements about possible future rate curves of a financial market.

Additional features and advantages of the present invention are described in, and will be apparent from, the Detailed Description of the Preferred Embodiments and the Drawings.

DESCRIPTION OF THE DRAWINGS

FIG. 1 shows the individual steps of the method of the present invention in a flowchart.

FIGS. 2a-2d show various curves of a functions family of an entropy function that allow conclusions to be drawn about the characteristics of the time series.

FIG. 3 is a block diagram that shows the various possibilities of what type the time series can be.

FIG. 4 is a sketch that shows a computer with which the method of the present invention is implemented.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

FIG. 1 shows the individual method steps of the method of the present invention.

The time series is measured in a first step 101. The time series can be of an analog nature, which requires a sampling of the time series so that the time series can be processed in a computer R. When, however, the time series is already present in individual digital values, then an analog-to-digital conversion of the time series is no longer required. What type of signals the time series can represent, for example, is explained later. The time series exists digitally for the processing in the computer R; i.e., it includes a prescribable plurality of samples x, dependent on a sampling interval with which the time series is sampled.
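The sampling in step 101 can be illustrated with a minimal Python sketch. It assumes the analog signal is available as a function of time; the names `sample_signal`, `signal`, `duration` and `dt` are hypothetical and do not appear in the method itself.

```python
def sample_signal(signal, duration, dt):
    """Sample a continuous-time signal at a fixed sampling interval dt.

    signal   -- hypothetical callable returning the signal value at time t
    duration -- length of the observation window
    dt       -- sampling interval; it fixes how many samples are produced
    """
    count = int(round(duration / dt))
    # Sample k is taken at time k * dt, so the plurality of samples
    # depends directly on the sampling interval.
    return [signal(k * dt) for k in range(count)]


# Usage: a linear ramp sampled four times over one time unit.
samples = sample_signal(lambda t: 2.0 * t, 1.0, 0.25)
print(samples)
```

Halving `dt` doubles the number of samples for the same observation window, which is the dependence on the sampling interval described above.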

In a second step 102, a generalized correlation integral is evaluated for at least some of the samples x_(t) of the time series, and a value c_(t) ^(n,τ,p,{circumflex over (N)},ε) of the generalized correlation integral is respectively determined for a sample. A correlation integral on which the generalized correlation integral is based is known, for example, from document [1]. A sample vector x _(t) ^(n,τ,p) derives from an arbitrary plurality of samples x_(t) of the time series at a respective time t. The samples are preferably distanced from one another by a time interval τ; i.e., a time interval τ respectively lies between two samples x_(t) and x_(t+τ).

Given the correlation integral known from document [1], it is merely provided to take the sample vector x _(t) ^(n,τ) and a partition interval quantity ε into consideration. What is to be understood by the partition interval quantity ε is the size of the subdivision of a data space in which the samples x_(t) of the time series can be located. A critical difference between the generalized correlation integral used in the method of the present invention and the known correlation integral is that, given the known correlation integral, at most one future sample with respect to the respectively currently processed sample x_(t), i.e. at most the future sample x_(t+1), is taken into consideration in the determination of the value of the correlation integral.

Given the generalized correlation integral on which the inventive method is based, future samples x_(t+pτ) with an arbitrary p are taken into consideration in the determination of the values c_(t) ^(n,τ,p,{circumflex over (N)},ε) of the generalized correlation integral, wherein p indicates the number of steps to the respectively considered future sample. For example, a value c_(t) ^(n,τ,p,{circumflex over (N)},ε) is derived from the generalized correlation integral according to the following rule: $\begin{matrix} {\begin{aligned} c_{t}^{n,\tau,p,\hat{N},\varepsilon} &= \frac{1}{\hat{N}} \sum\limits_{\hat{t}=1}^{\hat{N}} \Theta\left( \frac{\varepsilon}{f} - \left\| \underline{x}_{t}^{n,\tau,p} - \underline{x}_{\hat{t}}^{n,\tau,p} \right\| \right), \quad \text{with} \\ \underline{x}_{t}^{n,\tau,p} &= \left( x_{t}, x_{t+\tau}, \cdots, x_{t+(n-1)\tau}, x_{t+(n-1+p)\tau} \right), \quad \text{with} \\ \Theta(z) &= \begin{cases} 0: & z \leq 0 \\ 1: & z > 0 \end{cases}, \quad \text{whereby} \end{aligned}} & (1) \end{matrix}$

x_(t) references the sample x_(t) of the time series at the time t,

τ references the time interval that respectively lies between two samples x_(t), x_(t+1),

x_(t+τ) references a sample of the time series at a time t+τ,

n references a plurality of preceding samples taken into consideration,

p references the plurality of steps (time intervals) to the future sample taken into consideration,

x _(t) ^(n,τ,p) references the sample vector,

f references an arbitrary number,

{circumflex over (t)} references a running index with which all sample vectors x _(t) ^(n,τ,p) that are taken into consideration in the respectively employed, generalized correlation integral are referenced at the times {circumflex over (t)},

{circumflex over (N)} references a plurality of sample vectors x _(t) ^(n,τ,p) that are taken into consideration in the respectively employed, generalized correlation integral,

ε references the partition interval quantity of a data space in which the samples can be located,

Θ(z) references the Heaviside function.
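Rule (1) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the implementation of the method: indices are 0-based, the plurality of sample vectors is taken to be all complete vectors available in the series, the reference vector itself is included in the sum, and the Euclidean norm is used. All function names are hypothetical.

```python
from math import sqrt


def sample_vector(x, t, n, tau, p):
    """Build (x_t, x_{t+tau}, ..., x_{t+(n-1)tau}, x_{t+(n-1+p)tau})."""
    idx = [t + k * tau for k in range(n)] + [t + (n - 1 + p) * tau]
    return [x[i] for i in idx]


def euclidean(u, v):
    """Euclidean distance; the choice of norm is an assumption here."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def heaviside(z):
    """Theta(z): 0 for z <= 0, 1 for z > 0, as defined in rule (1)."""
    return 1 if z > 0 else 0


def correlation_value(x, t, n, tau, p, eps, f=1.0, norm=euclidean):
    """Fraction of sample vectors closer than eps / f to the vector at t."""
    n_hat = len(x) - (n - 1 + p) * tau  # number of complete sample vectors
    if not 0 <= t < n_hat:
        raise ValueError("no complete sample vector at index t")
    ref = sample_vector(x, t, n, tau, p)
    hits = sum(
        heaviside(eps / f - norm(ref, sample_vector(x, s, n, tau, p)))
        for s in range(n_hat)
    )
    return hits / n_hat


series = [0, 1, 2, 3, 4, 5]
print(sample_vector(series, 0, 2, 1, 2))           # [0, 1, 3]
print(correlation_value(series, 0, 2, 1, 1, 2.0))  # 0.5
```

In the second call, four sample vectors of the form (x_s, x_{s+1}, x_{s+2}) exist; two of them (s = 0 and s = 1) lie closer than ε = 2 to the vector at t = 0, which gives the value 0.5.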

A second possibility for determining the generalized correlation integral can be seen, for example, in the following rule: $\begin{matrix} {c_{t}^{n,\tau,p,\hat{N},\varepsilon} = \frac{1}{\hat{N}} \sum\limits_{\hat{t}=1}^{\hat{N}} \Theta\left( \frac{\varepsilon}{f} - \left\| \underline{x}_{t}^{n,\tau,p} - \underline{x}_{\hat{t}}^{n,\tau,p} \right\|^{m} \right),} & (2) \end{matrix}$

wherein m references an arbitrary number. Correspondingly, ||^(m) references an arbitrary norm.
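Rule (2) leaves the norm open. The following Python sketch shows two distance functions that could be plugged into the threshold test ε/f − ‖·‖ > 0. Reading the index m as the order of a Minkowski norm is an assumption made only for this illustration, and the function names are hypothetical.

```python
def minkowski_distance(u, v, m):
    """Distance under a Minkowski norm of order m (assumed reading of m)."""
    return sum(abs(a - b) ** m for a, b in zip(u, v)) ** (1.0 / m)


def maximum_distance(u, v):
    """Distance under the maximum norm, a common limiting case."""
    return max(abs(a - b) for a, b in zip(u, v))


u, v = [0, 0], [3, 4]
print(minkowski_distance(u, v, 1))  # 7.0
print(minkowski_distance(u, v, 2))  # 5.0
print(maximum_distance(u, v))       # 4
```

The choice of norm changes which sample vectors fall inside the environment of size ε/f, and therefore the individual values of the generalized correlation integral.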

In a further step 103, a functions family of an entropy function h(p, ε) is determined from the values c_(t) ^(n,τ,p,{circumflex over (N)},ε) of the generalized correlation integral for an arbitrary plurality of samples. In the determination 103, the preceding samples and the future samples with respect to the sample for which the value c_(t) ^(n,τ,p,{circumflex over (N)},ε) of the generalized correlation integral is respectively determined are taken into consideration.

The functions family of the entropy function h(p, ε) derives, for example, according to the following rule: $\begin{matrix} {h\left( p,\varepsilon \right) = \frac{1}{\tau} \lim\limits_{n\rightarrow\infty} \lim\limits_{\hat{N}\rightarrow\infty} \frac{1}{\hat{N}} \sum\limits_{t=1}^{\hat{N}} \log \frac{c_{t}^{n,\tau,p,\hat{N},\varepsilon}}{c_{t}^{n-1,\tau,l,\hat{N},\varepsilon}}.} & (3) \end{matrix}$

In this rule, the plurality p of steps to the future sample is employed as a family parameter of the functions family of the entropy function h(p, ε). The partition interval quantity ε is introduced as a running variable in the functions family of the entropy function h(p, ε).
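A finite-sample estimate of rule (3) can be sketched in Python as follows. Several assumptions are made for illustration only: the limits are replaced by fixed, finite n and a finite number of sample vectors; the third index of the denominator, printed as `l` in rule (3), is read as 1; f is set to 1; the Euclidean norm is used; and numerator and denominator are evaluated over the same start indices. All names are hypothetical.

```python
from math import log, sqrt


def vec(x, t, n, tau, p):
    """Sample vector (x_t, ..., x_{t+(n-1)tau}, x_{t+(n-1+p)tau})."""
    idx = [t + k * tau for k in range(n)] + [t + (n - 1 + p) * tau]
    return [x[i] for i in idx]


def dist(u, v):
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def c_value(x, t, n, tau, p, eps, starts):
    """Fraction of vectors (over the given start indices) within eps of t."""
    ref = vec(x, t, n, tau, p)
    hits = sum(1 for s in starts if eps - dist(ref, vec(x, s, n, tau, p)) > 0)
    return hits / len(starts)


def entropy_estimate(x, n, tau, p, eps):
    """Finite-n, finite-N estimate of h(p, eps); requires n >= 2."""
    starts = range(len(x) - (n - 1 + p) * tau)
    total = 0.0
    for t in starts:
        num = c_value(x, t, n, tau, p, eps, starts)
        # Assumption: the index printed as `l` in rule (3) is taken as 1.
        den = c_value(x, t, n - 1, tau, 1, eps, starts)
        # The reference vector always matches itself, so num and den are > 0.
        total += log(num / den)
    return total / (len(starts) * tau)


# Usage: a deterministic test series from the logistic map.
x = [0.3]
for _ in range(59):
    x.append(4.0 * x[-1] * (1.0 - x[-1]))
for p in (1, 2, 3):
    print(p, entropy_estimate(x, 2, 1, p, 0.2))
```

Evaluating `entropy_estimate` over a grid of ε values for each p yields one curve per p, i.e. a finite approximation of the functions family. With the ratio written as in rule (3), the estimate is never positive under these assumptions, because every vector counted in the numerator is also counted in the denominator.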

Illustratively, the generalized correlation integral means that the value c_(t) ^(n,τ,p,{circumflex over (N)},ε) of the generalized correlation integral respectively derives from a medium plurality of samples x_(t) that are located in an environment of a predetermined size around the sample x_(t). The value c_(t) ^(n,τ,p,{circumflex over (N)},ε) of the generalized correlation integral is respectively determined in the environment. In an embodiment of the method of the present invention, it is advantageous to prescribe the size of the environment dependent on the partition interval quantity ε.

As shown in FIGS. 2a through 2d and explained below, the functions family of the entropy function h(p, ε) includes different curves for different types of time series. The time series can be classified in a last step 104 on the basis of the different curves of the functions family of the entropy function h(p, ε).

FIG. 2a shows a typical curve of the functions family of the entropy function h(p, ε) for a process that supplies a time series that is characterized by a white noise. Qualitatively, the functions family of the entropy function h(p, ε) for this type of time series proceeds essentially as a straight line with positive slope when the partition interval quantity ε in logarithmized scale |log ε| is employed as running variable. What is characteristic of this curve is that the curve is always the same for every value p.

The functions family of the entropy function h(p, ε) for a time series that describes a Markov process (see FIG. 2d) exhibits a similar curve, but with straight lines having a positive slope respectively shifted approximately parallel. A family of straight lines with positive slope shifted essentially parallel derives for a respectively different plurality p of future samples x_(t) taken into consideration in the curve of the functions family of the entropy function h(p, ε) for a Markov process.

Advantages of the inventive method can be seen as a result thereof compared to the known method, wherein the plurality p of steps to the future considered sample x_(t) is not taken into consideration as family parameter in a functions family of the entropy function h(p, ε). A distinction between a process with white noise and a Markov process is not possible given the known discrimination criterion upon employment of the known correlation integral.

Further, a distinction between a chaotic process and a chaotic process with noise is likewise not possible with the known method upon employment of the known correlation integral.

Due to the employment of the plurality p of steps to the future considered sample x_(t) as family parameter of the functions family of the entropy function h(p, ε), a distinction also becomes possible between these two types of processes, whose characteristic curves are shown in FIGS. 2b and 2c. FIG. 2b describes the curve of the functions family of the entropy function h(p, ε) for a time series that describes a chaotic process. The curve of the functions family of the entropy function h(p, ε) is essentially characterized by substantially horizontally proceeding straight lines shifted parallel.

In contrast, FIG. 2c shows the curve of the functions family of the entropy function h(p, ε) for a time series that describes a chaotic process with noise. Up to a kink partition interval quantity ε′, this likewise shows a family of substantially horizontal straight lines shifted parallel. From the kink partition interval quantity ε′ onward, this horizontal family of straight lines changes into a family of straight lines with positive slope shifted substantially parallel. By employing the plurality p of steps to the future considered sample x_(t) as a family parameter of the functions family of the entropy function h(p, ε), a discrimination between two different processes is again possible, which was not possible given employment of known methods.

The above-described characteristic, qualitative curves of the functions families refer to a logarithmized scale |log ε| for the partition interval quantity ε. When only the partition interval quantity ε, not logarithmized, is employed as the running variable for the functions family of the entropy function h(p, ε), then the curve of the individual functions changes in accordance with a delogarithmization of the partition interval quantity ε.
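The qualitative criteria of FIGS. 2a through 2d can be turned into a simple decision rule, sketched here in Python. The sketch is a hypothetical illustration: the thresholds `slope_tol` and `shift_tol`, the split of the running-variable axis into two halves, and the assumption that the axis is ordered so that any horizontal part precedes the kink are not taken from the method and would need to be tuned to real data.

```python
def slope(u, h):
    """Least-squares slope of the values h over the running variable u."""
    n = len(u)
    mu, mh = sum(u) / n, sum(h) / n
    sxx = sum((a - mu) ** 2 for a in u)
    return sum((a - mu) * (b - mh) for a, b in zip(u, h)) / sxx


def classify(u, curves, slope_tol=0.1, shift_tol=0.1):
    """Classify from curves = {p: [h(p, eps) for each entry of u]}.

    u is the running variable (for example |log eps|), ordered so that a
    horizontal part, if present, comes first (assumption).
    """
    if len(u) < 4:
        raise ValueError("need at least four points on the running variable")
    half = len(u) // 2
    first = [slope(u[:half], h[:half]) for h in curves.values()]
    second = [slope(u[half:], h[half:]) for h in curves.values()]
    rising_first = sum(first) / len(first) > slope_tol
    rising_second = sum(second) / len(second) > slope_tol
    means = [sum(h) / len(h) for h in curves.values()]
    shifted = max(means) - min(means) > shift_tol
    if not rising_first and not rising_second:
        return "chaotic"                 # horizontal family (FIG. 2b)
    if not rising_first and rising_second:
        return "chaotic with noise"      # horizontal, then rising (FIG. 2c)
    if rising_first and rising_second:
        # Coinciding curves: white noise (FIG. 2a); shifted curves: Markov.
        return "Markov" if shifted else "white noise"
    return "unclassified"


u = [0, 1, 2, 3, 4, 5]
print(classify(u, {1: [0, 1, 2, 3, 4, 5], 2: [0, 1, 2, 3, 4, 5]}))
print(classify(u, {1: [0, 1, 2, 3, 4, 5], 2: [1, 2, 3, 4, 5, 6]}))
print(classify(u, {1: [1, 1, 1, 1, 1, 1], 2: [2, 2, 2, 2, 2, 2]}))
print(classify(u, {1: [1, 1, 1, 1, 2, 3], 2: [2, 2, 2, 2, 3, 4]}))
```

The four calls use idealized synthetic curves and return, in order, "white noise", "Markov", "chaotic" and "chaotic with noise". Splitting the axis at its midpoint is a crude stand-in for locating the kink partition interval quantity ε′.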

When a certain prior knowledge about the property of the investigated processes already exists, for example, when a distinction is merely to be made between two types of time series, then it is advantageous in a development of the method to implement the classification only into a first time series type or into a second time series type. It is thereby possible that the first time series type describes a time series in which a stochastic structure exists between the samples of the time series, and the second time series type describes a time series in which there is no stochastic structure between the samples of the time series. Due to this development of the method, the implementation of the method with the computer R is considerably simplified and, thus, accelerated since not all possible characteristics of curves of the functions family of the entropy function h(p, ε) have to be investigated.

FIG. 3 shows 301 various types of signals that can establish the time series. The time series can be realized 302 by an electrocardiogram signal (ECG). An advantageous employment is provided for this application since, as described in document [4], the appearance of non-linear correlations between the samples of the electrocardiogram signal allows the conclusion that the heart is at risk with respect to the occurrence of a sudden cardiac death. In the binary classification, the classification of the time series into the first time series type thereby corresponds to a classification of the electrocardiogram signal into an electrocardiogram signal of a heart at risk with respect to sudden cardiac death. In this case, the second time series type corresponds to an electrocardiogram signal of a heart not at risk with respect to sudden cardiac death.
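One hypothetical way to realize such a two-way decision is to examine a single feature of the functions family instead of all curve characteristics. The Python sketch below assumes that structure between the samples shows up as a dependence of the curves on the family parameter p, in line with the observation that the white-noise curve is the same for every p. The function name and the tolerance `shift_tol` are illustrative assumptions.

```python
def has_structure(curves, shift_tol=0.1):
    """Return True when the curves h(p, eps) differ noticeably between p.

    curves -- {p: [h(p, eps) for each value of the running variable]},
              with at least two values of p and aligned lists.
    """
    ps = sorted(curves)
    if len(ps) < 2:
        raise ValueError("need curves for at least two values of p")
    base = curves[ps[0]]
    # Largest pointwise gap between any curve and the curve for the smallest p.
    gap = max(abs(a - b) for p in ps[1:] for a, b in zip(curves[p], base))
    return gap > shift_tol


print(has_structure({1: [0, 1, 2], 2: [0, 1, 2]}))  # False
print(has_structure({1: [0, 1, 2], 2: [1, 2, 3]}))  # True
```

Because only the spread between the curves is evaluated, no slopes or kink positions have to be estimated, which is where the saving in calculating time would come from in this reading.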

It is also provided that the time series can be established 303 by an electroencephalogram signal (EEG).

Further, the time series can be established by a signal that describes 304 the curve of a local oxygen voltage of a brain. Further, the time series can be established 305 by a signal that describes variable rates of a financial market; for example, in foreign exchange commerce or general stock quotations, quotations of stock indicators, etc.

FIG. 4 shows a computer R with which the inventive method is implemented. The computer R processes the time series registered by the measuring instrument MG and supplied to the computer R. It is thereby of no significance whether the formation of the samples from the possibly analog signal is implemented in the measuring instrument MG or in the computer R. Both versions are provided for the method. For example, the measuring instrument MG can be an electrocardiograph (ECG), an electroencephalograph (EEG) or a device that works according to the method presented in document [5]. The classification result, which is determined by the computer R in the above-described way, is further-processed in a means for further-processing WV and presented, for example, to a user. For example, the means WV can be a printer, a picture screen or a loudspeaker via which an acoustic or visual signal is forwarded to a user.

Although the present invention has been described with reference to specific embodiments, those of skill in the art will recognize that changes may be made thereto without departing from the spirit and scope of the invention as set forth in the hereafter appended claims.

The following publications were cited in the framework of this document:

[1] P. Grassberger and I. Procaccia, Estimation of the Kolmogorov entropy from a chaotic signal, Physical Review A, Vol. 28, No. 4, pp. 2591 through 2593, October 1983.

[2] A. R. Osborne and A. Provenzale, Finite Correlation Dimension for Stochastic Systems with Power-Law Spectra, Physica D 35, Elsevier Science Publishers, ISSN 0167-2789, Amsterdam, pp. 357 through 381, 1989.

[3] A. Provenzale et al., Convergence of the K₂ Entropy for Random Noises with Power Law Spectra, Physica D 47, Elsevier Science Publishers, ISSN 0167-2789, Amsterdam, pp. 361 through 372, 1991.

[4] G. Morfill, Komplexitätsanalyse in der Kardiologie, Physikalische Blätter, Vol. 50, No. 2, pp. 156 through 160, 1994.

[5] LICOX, GMS, Gesellschaft für Medizinische SondenTechnik mbH, Advanced Tissue Monitoring.

What is claimed is:
 1. A method for the classification of a time series, comprising the steps of: generating a signal representing a dynamic process; sampling, using a sampler, the generated signal, producing a prescribable plurality of samples; determining, using a computer, values c_(t) ^(n,τ,p,{circumflex over (N)},ε) of a generalized correlation integral for at least a part of the samples; determining the value c_(t) ^(n,τ,p,{circumflex over (N)},ε) of the generalized correlation integral upon employment of preceding samples and future samples; determining a functions family of an entropy function h(p, ε) from the values c_(t) ^(n,τ,p,{circumflex over (N)},ε) of the generalized correlation integral, wherein the preceding samples and the future samples are respectively past and future samples in time with reference to the sample for which the value c_(t) ^(n,τ,p,{circumflex over (N)},ε) of the generalized correlation integral is respectively determined; employing a plurality (p) of the steps to the future considered sample as family parameter of the functions family of the entropy function h(p,ε); employing a partition interval quantity (ε) of a data space in which the samples can be located as a running variable of the functions family of the entropy function h(p,ε); and classifying the time series on the basis of the curve of the functions family of the entropy function h(p,ε).
 2. A method as claimed in claim 1, wherein the value c_(t) ^(n,τ,p,{circumflex over (N)},ε) of the generalized correlation integral is respectively derived from a medium plurality of samples that are located in an environment of predetermined size around the sample for which the value c_(t) ^(n,τ,p,{circumflex over (N)},ε) of the generalized correlation integral is respectively determined.
 3. A method as claimed in claim 2, wherein the size of the environment is dependent on the partition interval quantity (ε).
 4. A method as claimed in claim 1, wherein the respective value c_(t) ^(n,τ,p,{circumflex over (N)},ε) of the generalized correlation integral is formed according to one of the following rules: $\begin{matrix} {\begin{aligned} c_{t}^{n,\tau,p,\hat{N},\varepsilon} &= \frac{1}{\hat{N}} \sum\limits_{\hat{t}=1}^{\hat{N}} \Theta\left( \frac{\varepsilon}{f} - \left\| \underline{x}_{t}^{n,\tau,p} - \underline{x}_{\hat{t}}^{n,\tau,p} \right\| \right), \quad \text{with} \\ \underline{x}_{t}^{n,\tau,p} &= \left( x_{t}, x_{t+\tau}, \cdots, x_{t+(n-1)\tau}, x_{t+(n-1+p)\tau} \right), \quad \text{with} \\ \Theta(z) &= \begin{cases} 0: & z \leq 0 \\ 1: & z > 0 \end{cases}, \quad \text{whereby} \end{aligned}} & (1) \end{matrix}$

x_(t) references the sample x_(t) of the time series at the time t, τ references the time interval that respectively lies between two samples x_(t), x_(t+1), x_(t+τ) references a sample of the time series at a time t+τ, n references a plurality of preceding samples taken into consideration, p references the plurality of steps (time intervals) to the future sample taken into consideration, x _(t) ^(n,τ,p) references the sample vector, f references an arbitrary number, {circumflex over (t)} references a running index with which all sample vectors x _(t) ^(n,τ,p) that are taken into consideration in the respectively employed, generalized correlation integral are referenced at the times {circumflex over (t)}, {circumflex over (N)} references a plurality of sample vectors x _(t) ^(n,τ,p) that are taken into consideration in the respectively employed, generalized correlation integral, ε references the partition interval quantity of a data space in which the samples can be located, Θ(z) references the Heaviside function, $\begin{matrix} {c_{t}^{n,\tau,p,\hat{N},\varepsilon} = \frac{1}{\hat{N}} \sum\limits_{\hat{t}=1}^{\hat{N}} \Theta\left( \frac{\varepsilon}{f} - \left\| \underline{x}_{t}^{n,\tau,p} - \underline{x}_{\hat{t}}^{n,\tau,p} \right\|^{m} \right),} & (2) \end{matrix}$

wherein m references an arbitrary number.
 5. A method as claimed in claim 1, wherein the functions family of the entropy function h(p, ε) is formed according to the following rule: $\begin{matrix} {h\left( p,\varepsilon \right) = \frac{1}{\tau} \lim\limits_{n\rightarrow\infty} \lim\limits_{\hat{N}\rightarrow\infty} \frac{1}{\hat{N}} \sum\limits_{t=1}^{\hat{N}} \log \frac{c_{t}^{n,\tau,p,\hat{N},\varepsilon}}{c_{t}^{n-1,\tau,l,\hat{N},\varepsilon}}.} & (3) \end{matrix}$


6. A method as claimed in claim 1, wherein, in the classification, the time series is classified into one of a first time series type and a second time series type.
 7. A method as claimed in claim 1, wherein the first time series type describes a time series in which a stochastic structure exists between the samples of the time series, and the second time series type describes a time series in which no stochastic structure exists between the samples of the time series.
 8. A method as claimed in claim 1, wherein the partition interval quantity (ε) with logarithmized scale is employed as a running variable of the functions family.
 9. A method as claimed in claim 8, wherein the classification of the time series occurs according to at least one of the following criteria with respect to the curve of the functions family of the entropy function h(p, ε): when the curve is characterized by a substantially horizontal family of straight lines, then the time series is classified into a first time series type; when the curve is characterized by a functions family that, for smaller partition interval quantities (ε), exhibits a substantially horizontal family of straight lines and, with increasing partition interval quantities (ε), exhibits a family of straight lines with positive slope after kink partition interval quantities (ε′), then the time series is classified into a second time series type; when the curve for all future samples (p) is essentially characterized by a straight line with positive slope, then the time series is classified into a third time series type; when the curve is essentially characterized by a family of straight lines with shifted straight lines with positive slope lying substantially parallel to one another, then the time series is classified into a fourth time series type.
 10. A method as claimed in claim 1, wherein the step of generating the signal comprises the step of producing a measured electrocardiogram signal (ECG) which is used as the signal.
 11. A method as claimed in claim 1, wherein the step of generating the signal comprises the step of producing a measured electroencephalogram signal (EEG) which is used as the signal.
 12. A method as claimed in claim 1, wherein the step of generating the signal comprises the steps of: producing a voltage curve representing a brain pressure; and producing a measured signal from the voltage curve which is used as the signal.
 13. A method as claimed in claim 1, wherein the step of generating the signal comprises the step of producing a measured signal proportional to rate curves of a financial market which is used as the signal. 